Abstract

The primary function of the National Space Science
Data Center (NSSDC) is to provide the means for the
dissemination and analysis of space science data beyond that
provided by the original experimenter. The Center is,
therefore, not an operational data processing center.
However, the recognition of important secondary uses of
data generated for operational needs is vital. Since the
requirements placed on operational centers to accommodate
this secondary function are minimal, the experiences
of NSSDC in the secondary uses of space science data will
be discussed. With a present satellite generation rate of
10^12 bits per year of space science information, the data
management problems become significant, and considerable
data processing is required before maximum utilization
of the data base can be realized. In addition, the Data
Center must be concerned with an information system to
handle documentation, performance data, instrument
calibrations and characteristics, and management information.

I. Introduction

The National Space Science Data Center (NSSDC) was 
established in 1965 with the primary function of providing 
the means for the dissemination and analysis of space 
science data beyond that provided by the original experi- 
menter. To fulfill this mission, the Data Center is re- 
sponsible for the active collection, organization, storage, 
announcement, retrieval, dissemination, and exchange of 
space science data obtained from satellite experiments, 
sounding rocket probes, and high-altitude aeronautical
and balloon investigations. Thus, as can be seen from 
this description, the Data Center is not an operational 
data processing center in the strictest sense of the word. 
However, considerable data processing is required to ob- 
tain maximum utilization of the data base. For example, 
with a present satellite generation rate of 10^12 bits of
space science information per year, the data management 
problems become very important. 

In this context, it is critical to understand the second- 
ary uses of these data and what is involved. Both space 
science and environmental data may be collected for a 
number of reasons. On the one hand, there may be an 
operational mission that must be supported. On the other 
hand, the primary reason may be one of basic research 
in which an attempt is made to find out what is there, how 
it varies with time and space, as well as to understand its 
properties in terms of fundamental processes and princi- 
ples. Regardless of the initial reason, much of the data, 
either in the fundamental or converted form, may be very 
useful to others for entirely different reasons. Although 
the initial use of these data may have been exploited, the 
preservation of such data for secondary use is important 
for at least two reasons: (1) in many cases these data
are very expensive to obtain (a large satellite program
may cost millions of dollars) and it may take 4 or 5 years
of effort on the part of one research group, and (2) the 
actual volume of such data has become so large that it 
would be impractical to publish all of it. 

To satisfy the needs of the various secondary user
groups, such as space scientists other than the principal
investigators, scientists in related fields, engineers and
systems planners, and educational activities, specialists
in the various space science disciplines, systems analysis,
computer programming, data processing, technical writing,
publication procedures, and reproduction are required. In
short, both an automated data processing system and an 
information system are required to handle the numerous 
magnetic tapes, cards, pictures, microfilm, and copies of 
written, graphical, and tabular materials, and to prepare 
the Data Center holdings for effective secondary use by 
others interested in conducting space science investiga- 
tions. This preparation process often involves independent 
analysis on the part of Data Center scientists to properly 
service the total user community. 

Because the requirements placed on operational centers 
to accommodate this secondary function are minimal, the 
purpose of this paper is to discuss NSSDC experiences 
with the secondary uses of space science data, the data 
processing system developed to process these data, the 
interfaces in the use of primary and secondary data, and 
the integrated information system to support the use of the 
data, current and future, at the National Space Science 
Data Center. Hopefully, these experiences will prove 
useful to those of you who are concerned with operational 
systems and will be interfacing with data centers such as 
ours in your respective disciplines. 

II. Secondary Uses of Space Science Data 

Two of the biggest problems facing those of us con- 
cerned with data processing are the tremendous amount 
of data produced by satellite measurements that must be 
processed and analyzed and the wide diversity of space 
experiments. As mentioned previously, there were well 
over one trillion bits of data transmitted by satellites 
during the last year. These data serve many different 
purposes: engineering measurements are vital to the ad- 
vancement of techniques and procedures which lead to 
more sophisticated and reliable spacecraft systems; ap- 
plications satellites are concerned with communications, 
navigation, weather, and, as you well know, with earth re- 
sources; biomedical experiments assist the manned flight 
program; and scientific experiments measure those quan- 
tities which will lead to a general understanding of the 
natural phenomena. 

One must, however, understand the different philoso- 
phies behind the collection of these categories of data. 
The useful lifetime of engineering data is generally much
shorter than that for scientific data, since rapid strides 
in technology soon produce new devices with different and 
more desirable properties. Likewise, operational data is 
used in near-real time for most purposes. Consequently, 
the dissemination of this type of information must occur 
rapidly. The nature of space science measurements, on 
the other hand, requires that a much larger active life- 
time be provided for the data. The processes being ob- 
served are not yet completely understood, and they inter- 
act with each other in many different ways. As a result, 
there are significant variations with time and location. 

This requires large volumes of data to obtain relations,
patterns, and interactions. As new ideas develop in
understanding the phenomena, scientists will want to take
a new and different look at the existing data, and these
analyses will lead to new results which may have little
resemblance to the original intent of the individual
experiments. Thus, this fund of new scientific knowledge will
continue to grow.

But much work must first be done to prepare the data 
for independent analysis in the future by other scientists. 
The first step is to collect and preserve this data in the 
proper form. Raw data transmitted directly from the 
satellite is not the best form because it is impossible to 
acquire all the supporting information that is necessary
for its independent use. Reduced data records are the
basic records acquired by the Data Center. A discussion
of the processes involved in going from raw to analyzed
data is given in Reference 1. However, to ensure an
understanding of terms, a definition of reduced data records,
as viewed by NSSDC, will be given at this time.

Reduced Data Records — Data records prepared from 
raw data records by a compacting, editing, correcting, 
and merging operation performed under the supervi- 
sion of the principal investigator. Data in this form 
contain all the basic usable information obtained from 
the experiment and generally include the instrument 
responses measured as functions of time along with 
appropriate position, attitude, and equipment perform- 
ance information necessary to analyze the data in an 
independent fashion. The engineering corrections such 
as temperature, voltage, dead time, gain changes, and 
other similar corrections to the instrument response 
will have been made. Unusable noisy data and periods 
of questionable instrument performance will have been
removed as well as duplicate portions of information. 
Time averaging and the conversion of the instrument 
response to physical units will not have been accom- 
plished in most cases. Visual data, such as photo- 
graphs derived from data processing techniques, may 
also be considered as reduced records. 

For some scientific uses of the data, it is not neces- 
sary to reanalyze or reevaluate. In most cases the in- 
terpretations given by the original investigator are the 
most valid and respected ones. These analyzed results 
are often incorporated with those of other experiments in
order to gain new understanding of the various phenomena.
For this reason, the Data Center must also collect
analyzed data records, which are defined as follows.


Analyzed Data Records — Data records prepared from 
reduced data by the principal investigator, his co- 
workers, and other space scientists which display the 
scientific results of the experiment. In general, the 
physical quantities derived from the sensor responses 
are displayed in various appropriate coordinate systems 
and correlated with other geophysical measurements. 
The results may be time averaged over meaningful in- 
tervals, displayed in the form of parameters of specific 
physical models or theories or as best-fit parameters 
of empirical descriptions. This form may include 
charts, graphs, photographs, and tables which are the 
results of data processing and analysis techniques em- 
ployed by the analyzing scientist. Examples of these 
appear in his published works, but the total number are 
usually too large to be published in their entirety. 

To assist secondary users such as engineers or system
planners, a new product — the space environment — must 
be developed from the raw materials which flow into the 
Data Center. Most knowledge of the space environment 
comes directly from the scientific measurements. How- 
ever, only a small segment comes from each individual 
experiment. It is necessary to study the analyzed and/or 
reduced data from many experiments in order to obtain a 
fairly comprehensive description of the space environment. 
This translation or synthesis into useful data summaries, 
compilations, or environments is a natural professional 
activity for those space scientists associated with the Data 
Center. It represents a type of analysis not done very 
often by the original investigators and contributes a useful 
product for dissemination to all user groups, including 
space scientists. (2) 

The creation and documentation of a particular model 
of some environmental parameters could be considered as 
a state-of-the-art survey in a scientific field as well as a
useful new output. Such a new model could also serve to 
identify certain data as no longer useful. Thus, these data 
subsets could be retired from the active data base or 
purged completely. (3) 

The type and amount of data that will flow into the Data
Center from a particular satellite will depend upon its
mission and the number of experiments carried onboard.
Table 1 shows the number of successful experiments flown
and the number in which some data has been acquired by
NSSDC. The actual time involved in the flow from source
to the Data Center may range from weeks in the case of
photographs to years in the case of some satellite data.
In the latter case, scientists must be given sufficient time
to plan and conduct the experiment and the subsequent
primary analysis of the data.

                                Successfully
                                Flown           Some Data
Disciplines                     Experiments*    At NSSDC*

Ionospheres & Radio Physics          121             25
Planetary Atmospheres                140             19
Particles and Fields                 579             96
Solar Physics                        102             23
Astronomy                             27              3
Planetology (Inc. Selenology)        137             56

Total                               1106            222

*As of October 31, 1969

Table 1. Data On Hand Vs. Successful Experiments


These data can arrive at the Data Center on: (a) magnetic
tapes, (b) microfilm, (c) microfiche, (d) photographic
positives or negatives, (e) graphs and roll charts,
(f) computer-generated plots, or (g) printed materials. To give
you an idea of the data at NSSDC, Table 2 shows the
growth of the data base at NSSDC. More will be said on
this subject later. Of course, having the data is not
enough. We must also have an information system which
can retrieve facts about the data, satellite orbits,
on-orbit performance, instrument characteristics, transmitter
frequencies, and other supporting information such as
funding information. Access and retrieval of this
information in a variety of ways serves the management
community as well as the other users previously described.
The value of having such supporting information and the
reduced data to incorporate into a university graduate
program has been pointed out by Dessler. (4) He stated
that the cost of a space-hardware program could perhaps
be reduced from about $400,000 per Ph.D. to about
$100,000 by carrying out one or two space experiments
and supplementing this with the analysis of data obtained
from NSSDC. Another example of the secondary use of
space science data can be seen by the growth in the
number of users of data held in NSSDC as shown in Figure 1.
At the present time, we are processing over 1800 requests
per year for secondary users of space science data. More
important, this number has been increasing at a steady
pace, and, according to all indications, will continue to
climb.

                                         August      August
Medium                                   1967        1969

Sheets and Bound Volumes, Sheets         175,000     284,000
Digital Magnetic Tapes, 1/2" x 2400'         291       7,426
Microfilm, 100-Ft Rolls                    7,800      12,224
Photographic Films:
  9-1/2" Width, Linear Feet               14,000      18,000
  70-mm Width, Linear Feet                33,200     222,250
  35-mm Width, Linear Feet                     0     759,500
  4 x 5 Inch, Each                         2,100       3,639
  8 x 10 Inch, Each                            0         400
  16 x 20 Inch, Each                          93          93
  20 x 24 Inch, Each                       2,200       8,005
Photographic Prints:
  9-1/2" Width, Linear Feet                    0       9,000
  70-mm Width, Linear Feet                     0      22,000
  8 x 10 Inch                                600       3,900
  11 x 14 Inch                               200         500
  16 x 20 Inch                                93          93
  20 x 24 Inch                             2,200       6,120

Table 2. Growth of the Data Base at NSSDC

To meet the current and anticipated needs of secondary
data users, the Data Center must provide a wide range of
services. One guiding principle is that if the data cannot
be handled by a diversified spectrum of users with a
minimum of effort, they should remain with the original
investigators and be noted as available. To provide the
necessary services, NSSDC must have the following
capabilities:

• A data processing system

• An information system about both the data comprising
the data base as well as the availability of specialized
data collections that exist in other locations

• Microfilming, digitizing, and computing equipment
with enough flexibility to be able to accept data in
almost any form and be able to provide the data in a
variety of ways so that it is readily usable by a
diversified user community

• A specialized technical library and automated
document retrieval system

• A professional staff in the scientific disciplines that
carries out analysis and synthesis of the data

• A professional staff in the computer and information
sciences that develops and upgrades information
systems, analysis routines, and storage and retrieval
techniques based on the latest technology.

The complexity of the job to be done together with the
huge volume of data that must be handled and processed
required the adoption of a total systems approach and the
automation of the Data Center.

III. Flow of Data and Information

It was within this framework that the Data Center
planned and developed its current integrated information
and data processing system. To oversimplify the mission
of NSSDC, one must first arrange for obtaining the space
science data and understand the form/format of incoming
data. Once the data begin to arrive, there must be a
central source of information concerning these data. This
need is satisfied by the subsystem called Automated
Internal Management (AIM). Upon arrival, one must process
the machine-sensible data, prepare it for retrieval, and be
able to handle special types of data in different forms and
formats; this is done through the Machine-Oriented Data
System (MODS). (The steps for processing
nonmachine-sensible data are generally analogous.) In their work, the
professionals at NSSDC must have ready access to the
documentation relating to appropriate satellites,
experiments, and data; the Technical Reference File (TRF)
serves this purpose. Finally, statistics must be kept on
the processing and use of data, and management must
have a variety of reports in this area; this is greatly
facilitated by the use of the Request Accounting Status
and History (RASH) file. These subsystems are being
tied together to form the General Automated Internal
Management (GAIM) system and are supported by five
additional special-purpose files: Computer Program
Status Report, Magnetic Tape Unit Record, Computer
Program File, Rocket File, and Distribution File.

[Chart: NSSDC Data Requests]

Figure 1. Growth of Requests. As of January 1968, requests requiring machine processing were identified apart from
other requests, and single requests requiring different forms of data were treated as multiple requests.

To obtain a better understanding of the NSSDC system
and to get a broad picture of what happens during this
process, it will be helpful to follow the path of
information flow from the experimenter to the system. First, a
space data scientist is assigned to each
satellite/experiment/data set, as appropriate. He then obtains advance
prelaunch information from such sources as the satellite
project office, news releases, bulletins, reports, and
personal contacts. This and subsequent information is
entered into a working acquisition file, and, at this time,
an AIM entry is generated. The agent establishes contact
with the experimenter and his data processing personnel
to arrange for bringing in data and related documentation.
It should be pointed out that long periods of time are
normally involved between this first stage and getting the
actual data into the NSSDC system. This usually takes
two or more years after launch.

Once the preliminaries are over, the acquisition
scientist remains in constant contact, through visits or
phone, with the experimenter and his data processing
staff to help solve problems relating to the submission of
data and documentation. Thereafter, the data and
information come in almost on an automatic basis, except
where special problems arise. The first items that should
arrive at the Data Center are usually calibration curves,
unpublished information, instrument descriptions, and
data processing documentation. These are analyzed and
selectively entered into the acquisition file and TRF, and
notices are placed into the AIM file for subsequent use in
processing incoming data and preparing publications.

Next, the reduced data, consisting mainly of magnetic 
tapes, arrive. At this time, the acquisition agent, together 
with a programmer, as required, verify and analyze this 
data, prepare duplicates of the tapes, and prepare data set 
catalogs (indexing) using the MODS subsystem to accomplish
these tasks. At this point, tape reformatting often
has to be accomplished. The agent then feeds appropriate 
information into special-purpose files such as the tape 
and program status files. The AIM entry is brought up to 
date. (These subsystems will be discussed later.) 

The analyzed data, normally made up of plots, graphs, 
and tables, arrive quite a bit later. Of course, for older 
experiments that are not yet in NSSDC, analyzed and re- 
duced data may arrive in any order. The acquisition 
agent again goes through a similar processing cycle as in
the case of machine-sensible data. Data are verified,
analyzed for information content, logged, indexed, copied
or microfilmed, and the information entered into the AIM
information subsystem.
The working relationship with the experimenter is
beneficial for the information transfer in other ways.
Through this association and contacts at professional
meetings, the acquisition agent receives copies of
appropriate talks, reports, preprints, and published papers, as
well as gaining a deeper understanding of the experiment
and the implications of the data. (These items are
supplemented by a thorough screening of the current
literature.) Each of these documents is carefully analyzed,
keyworded (by the acquisition agent), and entered into the
TRF. Appropriate information is extracted for entry into
the AIM.

It should be clear by now that the acquisition scientist
spends a great amount of his time studying and working
with the data to put all the necessary information together
so that it may be useful to others. Using the information
from the AIM and TRF subsystems and special-purpose
files, the acquisition agent and publications staff prepare,
as necessary, Data Announcement Bulletins and entries
for the Catalog of Satellite Experiments. This
does not necessarily mean that all data from a particular
satellite experiment have arrived or have been completely
processed. Many other contacts and correspondence with
the experimenter may still be necessary. The
preparation of a Data Users' Note concerning a particular
experiment normally occurs after the final stage of data
acquisition and processing. This document shows where the
supporting information is, in what forms the data are
available, what literature of previous work relating to the
experiment is available, and offers a key to the use of the
data.

The information flow is not complete without a
mention of the RASH subsystem. The acquisition and request
agents work through the RASH subsystem in satisfying
users' requests on a daily basis. These requests may
involve copies of data or publications, logical searches of
the information files, or may even require further
detailed data analysis on the part of the acquisition agent to
help solve a particular problem.

IV. The Information and Data Processing System

Automated Internal Management (AIM) 

AIM, as the centralized source of information, is the
hub around which the other subsystems revolve. It is
built upon detailed descriptions of the data, experiment,
and spacecraft, along with the status of acquisition
activity. The purposes served by the AIM subsystem are
detailed in Figure 2.

AUTOMATED INTERNAL MANAGEMENT

• LOGICAL SEARCHES TO ANSWER QUERIES
• WORKLOAD/VOLUME OF EXPECTED DATA
• ACTION REMINDERS
• ACQUISITION MANAGEMENT REPORT
• SPACECRAFT/EXPERIMENTS/DATA SETS
• RESPONSIBLE AGENT
• PRIORITY
• STAGE OF ACQUISITION
• HOURS EXPENDED
• CURRENT ACTIVITY
• NEXT CONTACT
• FILE INDEX
• LISTING OF SPACECRAFT/EXPERIMENTS/DATA SETS

Figure 2. Uses of AIM.

The contents of the AIM file are organized into a
hierarchical structure. The most significant level is the
spacecraft. Information which relates to the spacecraft
is included here. The second level relates to the
experiment. All experiment identification, detector
descriptions, and commentary about a single experiment are
contained in this section. The third level deals with a
single data set.* These levels are generally tied together
in the following manner, depending on identification of
experiment and availability of data: the satellite-level entry
will be followed by all experiment-level entries which
pertain to that spacecraft; similarly, the data set-level
entries are associated with the experiment. This concept
can be perhaps better visualized by examining the typical
AIM File Index entry shown in Figure 3.

Within each of these levels in a full AIM entry, there
are specified categories of information concerning
personnel, objectives, telemetry, instrumentation, acquisition
information, experiment performance, data set contents,
etc.
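
As an illustration only, the three-level organization described above might be sketched as follows. The class names, field names, line prefixes, and identifiers are hypothetical and are not taken from the AIM file itself; the sketch shows only the ordering rule, in which a spacecraft entry is followed by its experiment entries, each followed by its own data set entries.

```python
from dataclasses import dataclass, field


@dataclass
class DataSet:
    ident: str              # hypothetical data set identifier
    description: str = ""


@dataclass
class Experiment:
    ident: str              # hypothetical experiment identifier
    name: str
    data_sets: list[DataSet] = field(default_factory=list)


@dataclass
class Spacecraft:
    ident: str              # hypothetical spacecraft identifier
    name: str
    experiments: list[Experiment] = field(default_factory=list)


def file_index(spacecraft: Spacecraft) -> list[str]:
    """Flatten one spacecraft entry in hierarchical order: the spacecraft
    line, then each experiment line followed by its data set lines."""
    lines = [f"S {spacecraft.ident} {spacecraft.name}"]
    for exp in spacecraft.experiments:
        lines.append(f"  E {exp.ident} {exp.name}")
        for ds in exp.data_sets:
            lines.append(f"    D {ds.ident} {ds.description}".rstrip())
    return lines


if __name__ == "__main__":
    example = Spacecraft(
        "SC-1", "Example",
        [Experiment("SC-1-01", "Detector",
                    [DataSet("SC-1-01A", "count rates")]),
         Experiment("SC-1-02", "Magnetometer")],
    )
    print("\n".join(file_index(example)))
```

Nesting the levels, rather than keeping three flat lists, makes the ordering of the index listing a direct consequence of the structure.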

As noted earlier, AIM is also used for providing
management information. Based on the same levels just
discussed, detailed information is provided concerning
spacecraft, experiment, and data set. The various categories
of information are explained in Reference 2.

A few examples of the current activity of the AIM file
may be appropriate at this time. Orbital parameter data
for 85 Russian satellites have been coded. The AIM file
maintenance consists of about 1,000 general changes per
quarter, and this file is currently accounting for about
1,200 satellites, 1,600 experiments, and 350 data sets.

Machine-Oriented Data System (MODS) 

To be responsive to the users who request data in
digital form, as well as to those who provide the original data,
NSSDC must have the flexibility to accept the data in any
format and to provide it in any format. Since both the
giver and taker will have restrictions imposed by their
existing computer hardware and software, the Data Center
facility must provide the "impedance matching." MODS
is used for processing the data into the NSSDC
computerized data base, for data set analysis, generation of data
set catalogs, tape reformatting (when the interchange of
information is inhibited by the diversity of hardware), and
production of allied reports. Perhaps the best way to
examine the composition of this subsystem is to follow
incoming machine-sensible data sets through their
processing cycle and then look at the tape reformatting
process.

*Defined as a body of data reduced by one group of investigators in one specific way in a form, format, or organization
which uniquely describes it. It can be a unit of machine-sensible or nonmachine-sensible data which can contain one
to several hundred magnetic tapes, rolls of film, etc.

EXPLORER 4 B BPS A EXPLORER 4 1958 EPSILON, 07/26/58 US ARMY R05A0O 46 14
DE 0719.0904
DE 0820,0904

SPACECRAFT IDENTIFICATION
EXPERIMENT IDENTIFICATION
DATA SET IDENTIFICATION
SPACECRAFT NAME
DISCIPLINE GROUP (INTERNAL CODE)
ACQUISITION AGENT INITIALS
PRIORITY

• SPACECRAFT/EXPERIMENT/DATA SET
• DATE OF LAUNCH
• AGENCY/INVESTIGATOR
• INTERNAL CODES
• DATE OF LATEST ENTRY
• MANAGEMENT INFORMATION UPDATE

Figure 3. Typical AIM File Index Entry

Processing Incoming Data Sets 

All magnetic tapes received by NSSDC are first entered 
into the storage records by filling out the proper forms 
and assigning a unique accession number. At this point, 
an acquisition scientist, to be assisted by a programmer 
as necessary, is assigned to the data set for preliminary 
analysis. 

The joint objectives of the acquisition scientist and
programmer in the preliminary processing are: (1) ability 
to read the entire tape in its logical format; (2) ability to 
list out any function or special record; (3) ability to de- 
tect errors (logical and physical); and (4) verification of 
the acquisition agent's understanding of the contents. 

During this preliminary processing, the programmer 
writes all the necessary routines to manipulate the data 
and reformat it, if necessary. These programs are en- 
tered into the Computer Program File. The preliminary 
analysis stage is completed when NSSDC has the ability 
to use and interpret all data in the data set. This may
require additional contacts with the experimenters. 

At this time, the acquisition agent and the programmer 
define the format of the data set catalog, the functions of 
which are to: 

• Provide an index to the contents of the data set
• Provide a series of error checks
• Calculate bounds or distributions of functions
• Provide a useful tool to the request agent for
identifying data
• Provide a coarse description of the information in
the data set

After the requirements are defined, the programmer 
writes a program to produce the data set catalog. This 
routine should also produce a copy of the original tape or 
a reformatted version. After checking the program and 
turning it over to the computer people, the rest of the 
tapes in the data set are processed. 
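
A minimal sketch of such a catalog routine is given below, in present-day Python, purely to make the listed functions concrete. It assumes, hypothetically, that each tape has already been read into a list of (time, value) records; the record layout, the field names, and the single error check shown (record times running backwards) are illustrative assumptions and not the NSSDC catalog format.

```python
def build_catalog(tapes):
    """Build one catalog entry per tape.

    tapes: mapping of tape label -> list of (time, value) records.
    Each entry indexes the tape by time span, gives bounds of the
    measured function, and carries a simple error-check count.
    """
    catalog = []
    for label, records in tapes.items():
        times = [t for t, _ in records]
        values = [v for _, v in records]
        # Error check: count places where record time runs backwards.
        out_of_order = sum(1 for a, b in zip(times, times[1:]) if b < a)
        catalog.append({
            "tape": label,
            "records": len(records),
            "start": min(times) if times else None,
            "stop": max(times) if times else None,
            "value_min": min(values) if values else None,
            "value_max": max(values) if values else None,
            "out_of_order": out_of_order,
        })
    return catalog


def find_tapes(catalog, time):
    """Return labels of tapes whose time span covers the requested time."""
    return [
        entry["tape"]
        for entry in catalog
        if entry["start"] is not None and entry["start"] <= time <= entry["stop"]
    ]
```

With entries of this kind, a request agent could locate the tapes covering a requested interval from the catalog alone, without mounting any tape.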

Tape Reformatting 

The processing of normal machine-sensible data is
well taken care of by using the procedures just outlined.
However, experience has shown that people will not use
data if it takes a lot of time and effort to convert it to a
format which allows for direct entry into their own
computer. Consequently, one of the major problems
confronting the Data Center is the processing of magnetic
tapes produced by different computers and operating
systems. This presents two main difficulties: the
physical problem of reading tapes which cannot be used with
the normal hardware available to NSSDC and the logical
problem of interpreting data where word size, word
format, data arrangement, etc., cannot be easily handled with
standard software. The approach used is to achieve the
desired flexibility by producing software which is
bit-oriented rather than character- or word-oriented. The
Data Center currently has routines available to read tapes
generated by a number of operating systems,* as well as
BCD (binary coded decimal) tapes with arbitrary and
variable record sizes, physically formatted binary tapes, and
FORTRAN-generated tapes. For achieving compatibility
with systems using 9-track tape, NSSDC uses other
computers at GSFC. The hub of the MODS subsystem is a
package called PIFT (Package for Information Formatting
and Transformation) which will accomplish the functions
just outlined and at the same time will produce densely
packed machine- and media-independent data sets that
may be accessed in the man-machine mode. A basic
building block of this subsystem is a bit manipulator
program which has recently been completed. Although
special-purpose routines are available, their use is
constrained. PIFT, on the other hand, as a problem-oriented
language, allows junior-level programmers to perform
manipulations in a straightforward manner.

*BESYS, APLOS, IBM-DCS operating systems.

The first version of PIFT has been produced and is
operational. This version uses the capabilities built into the
computer software system to generate routines which
depend on specific data. All data are processed by the
NSSDC computer despite the fact that the tapes were
originally generated by a variety of other computers and
systems. It is capable of generalized bit manipulation and
several forms of number conversion. In this first of two
phases, PIFT is capable of:

• Defining the logical content of a data record in a
machine-independent language

• Producing highly compacted, self-documenting,
machine-independent tapes

• Producing a homogeneous data base designed to
facilitate direct access

For example, we have successfully generated IBM 360 
tapes from IBM 7094 tapes. 
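
The bit-oriented approach can be illustrated with a short sketch in present-day Python. This is not PIFT itself; the function names, the layout notation, and the assumption that a record is available as a string of 8-bit bytes are all hypothetical. The point shown is that fields are addressed by bit offset and width, so the word size of the originating machine need not match that of the reading machine. Sign-magnitude conversion is included as one example of a number conversion; the text does not state which conversions PIFT supports.

```python
def read_bits(data: bytes, bit_offset: int, width: int) -> int:
    """Return `width` bits starting at `bit_offset`, most significant
    bit first, as an unsigned integer."""
    value = 0
    for i in range(bit_offset, bit_offset + width):
        bit = (data[i // 8] >> (7 - i % 8)) & 1
        value = (value << 1) | bit
    return value


def unpack_record(data: bytes, layout):
    """Unpack consecutive bit fields. `layout` is a list of
    (field name, width in bits) pairs describing the logical record."""
    fields = {}
    offset = 0
    for name, width in layout:
        fields[name] = read_bits(data, offset, width)
        offset += width
    return fields


def sign_magnitude_to_int(raw: int, width: int) -> int:
    """Interpret an unsigned `width`-bit field as sign-magnitude:
    the top bit is the sign, the remaining bits the magnitude."""
    sign = raw >> (width - 1)
    magnitude = raw & ((1 << (width - 1)) - 1)
    return -magnitude if sign else magnitude


if __name__ == "__main__":
    # Hypothetical record: a 3-bit flag, a 5-bit count, an 8-bit value.
    record = bytes([0b10110011, 0b01010101])
    print(unpack_record(record, [("flag", 3), ("count", 5), ("value", 8)]))
```

Because the layout is data rather than code, a new tape format would require only a new layout description, which is in the spirit of defining the logical content of a record in a machine-independent way.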

In phase 2, active development has already started in
the following areas:

1. The definition of a language to identify all data sets
and to retrieve any information requested by the user.

2. The development of a set of subroutines to operate
on the data to:

• Perform analysis

• Produce data set catalogs

• Perform error analysis and data verifications

• Produce useful data products

available at NSSDC, as well as that which exists in the
published literature. The references include published
and unpublished documents relating to the spacecraft,
experiment, or data set which are or will be preserved at
the Data Center.

The computer can display pertinent information in a
variety of ways. Open (subjective) and controlled
keywords are used to cover standard satellite/rocket
identification, type and content of publication, and
discipline-oriented keywords. The methodology for describing the
type and content of publication can be seen in Figure 4. A
typical TRF entry is presented in Figure 5.

PUBLICATION CLASS

A JOURNAL ARTICLES
B BOOKS
C GOVERNMENT PUBLICATIONS
D UNIVERSITY REPORTS
E INDUSTRY REPORTS
F MAGAZINES, PRESS RELEASES,
  AND NEWSPAPER ARTICLES
G PROCEEDINGS, SYMPOSIA &
  OTHER COLLECTIONS
H UNPUBLISHED

CONTENT CODE

0 BIBLIOGRAPHIC
1 THEORETICAL PAPERS
2 SCIENTIFIC PAPERS,
  EXPERIMENTAL RESULTS
3 INSTRUMENT DESCRIPTION PAPERS
4 DISCIPLINARY REVIEW PAPERS
5 SATELLITE & MISSION DESCRIPTION
6 NEWS RELEASES
7 DATA PROCESSING PAPERS
8 WORKING PAPERS, MINUTES, ETC.
9 DATA TABULATION

Figure 4. Codes to Identify Publications. Code letters
and numerals are combined to identify both type and
content of publication by the use of two or more
characters, e.g., BG 23.

To overcome the common gap between indexer-selected 
terms and user-selected terms, the acquisition agents 
themselves, who are space data scientists and the prime 
users of this subsystem, review the literature, select the 
entries, and keyword the Inputs.* Thus, each member of 
the acquisition staff devotes a portion of his time to build- 
ing up the TRF base and verifying the output. In this 
manner, up to 120 items per week are entered into the 
file, which now contains well over 4,000 entries. 


3. A study leading to the definition of an optimum 
data base structure for the data seta held at NSSDC. 

Technical Reference File (TRF) 

The Data Center professionals must have internal 
documentation support and a tool for satisfying the bib- 
liographic needs of apace science data users. This is 
why the TRF comes into the system. It provides access 
to documents used for evaluating and verifying acquired 
data and for publishing catalogs, Data Users 1 Notes , and 
bulletins. It provides a useful record of the documentstlon 


As concerns the external uses of the TRF, consider- 
able effort is presently being devoted to the production of 
a notebook-sized TRF output. Once this Is lully imple- 
mented, NSSDC will have the capability of producing space 
science bibliographies ordered by author, discipline, ex- 
periment, or spacecraft. To produce special bibliogra- 
phies upon request, a logical routine has been integrated 
Into the TRF program which allows for the usual Boolean 
logic searches of AND, OR, and NOT among the entries. 
Present snd additional keywords are also being studied 
to eventually derive a meaningful thesaurus. 


•Document storage and retrieval is based upon randomly assigned accession numbers. 
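
The publication codes of Figure 4 and the Boolean AND, OR, and NOT searches described for the TRF can be sketched as follows. This is a minimal Python illustration, not the TRF program: the code tables are our reading of Figure 4 (several entries are hard to read in the source), and the entry layout (an accession number plus a set of keywords) and the function names are assumptions made for the example.

```python
# Code tables as read from Figure 4; treat the exact wording as illustrative.
PUBLICATION_CLASS = {
    "A": "journal articles",
    "B": "books",
    "C": "government publications",
    "D": "university reports",
    "E": "industry reports",
    "F": "magazines, press releases, and newspaper articles",
    "G": "proceedings, symposia, and other collections",
    "H": "unpublished",
}
CONTENT_CODE = {
    "0": "bibliographic",
    "1": "theoretical papers",
    "2": "scientific papers, experimental results",
    "3": "instrument description papers",
    "4": "disciplinary review papers",
    "5": "satellite and mission description",
    "6": "news releases",
    "7": "data processing papers",
    "8": "working papers, minutes, etc.",
    "9": "data tabulation",
}


def decode_publication_code(code):
    """Split a combined code such as 'BG 23' into class and content meanings."""
    classes = [PUBLICATION_CLASS[c] for c in code if c.isalpha()]
    contents = [CONTENT_CODE[c] for c in code if c.isdigit()]
    return classes, contents


def boolean_search(entries, all_of=(), any_of=(), none_of=()):
    """Return accession numbers of entries matching a keyword query.

    all_of  -> every keyword must be present (AND)
    any_of  -> at least one keyword must be present (OR); ignored if empty
    none_of -> no keyword may be present (NOT)
    Each entry is assumed to be a dict with 'accession' and 'keywords' (a set).
    """
    hits = []
    for entry in entries:
        keywords = entry["keywords"]
        if not all(k in keywords for k in all_of):
            continue
        if any_of and not any(k in keywords for k in any_of):
            continue
        if any(k in keywords for k in none_of):
            continue
        hits.append(entry["accession"])
    return hits
```

Sorting the returned accession numbers by author, discipline, experiment, or spacecraft would then give the ordered bibliographies mentioned in the text.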
[Figure 5 callouts: discipline keyword(s); open keyword(s);
class/type publication code; date entry last altered]

Figure 5. View of TRF Entry


Request Accounting Status and History (RASH) 

At this point, the data from the space science experi- 
ments have been obtained, entered into the system, and 
prepared for retrieval. The next step is to facilitate the
acquisition and request agents' contacts with the users of 
these data — this need is satisfied through the RASH sub- 
system. 

This subsystem helps keep track of the progress of
requests received by the Data Center and provides
management with bookkeeping information. Specifically,
RASH is designed to display up-to-date information
relating to the number of requests, their status, estimated
and actual costs, processing time, and necessary action
reminders.* This variety of information can be retrieved
by data set, requester, affiliation, date of request, date
filled, request agent, status of request, and so forth.
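
A request record of the kind RASH is described as tracking, together with retrieval by any of the listed fields, might be sketched as below. The field names, the status values, and the two helper functions are hypothetical choices for illustration; the source does not describe the actual RASH record layout.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Request:
    """One data request; fields mirror the retrieval keys named in the text."""
    data_set: str
    requester: str
    affiliation: str
    date_requested: date
    date_filled: Optional[date]   # None while the request is still open
    request_agent: str
    status: str                   # e.g. "open" or "filled" (assumed values)
    estimated_cost: float
    actual_cost: Optional[float] = None

    def processing_days(self) -> Optional[int]:
        """Days from receipt to completion, or None if not yet filled."""
        if self.date_filled is None:
            return None
        return (self.date_filled - self.date_requested).days


def retrieve(requests, **criteria):
    """Return requests whose fields equal every given criterion."""
    return [
        r for r in requests
        if all(getattr(r, field) == value for field, value in criteria.items())
    ]


def status_summary(requests):
    """Count requests by status, for management bookkeeping."""
    return dict(Counter(r.status for r in requests))
```

For example, `retrieve(requests, affiliation="University A", status="open")` would list the open requests from one institution, and `status_summary(requests)` gives the overall counts.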

Generalized Automated Internal Management (GAIM) 

The various subsystems just described need to be tied
together so that each file can be accessed. For example, 
a realistic requirement may call for information on both 
available data and documentation. GAIM will serve this 
purpose. The development of GAIM will occur in three
phases. During phase 1, the various subsystems comprising
GAIM (AIM, TRF, RASH, etc.) are being combined
into a single homogeneous data base. The data base will
be structured in such a manner as to be responsive in an
interactive environment. A generalized command
language to facilitate file maintenance, support searches,
and generate special reports is being developed. The
language must be suitable for interacting with acquisition
scientists, data technicians, and other users of Data
Center facilities.

Phase 2 will center on the development of a prototype
interactive analysis system operated from remote
terminals and supported by graphic devices. Included will
be tools for general modeling, unit conversions, standard
transformations, statistical packages, etc. Display
techniques, both on-line and off-line, will be provided to assist
the secondary data user in his research. Also, a natural
command language will be developed which is responsive
to the scientific user untrained in computer applications.(4)

*This subsystem (RASH) also can aid in the construction of a model of the NSSDC by supplying viable information describing
the user community, types of requests and responses, and data sets most likely to be used.

The third phase associated with the development of
GAIM is the definition and specification of the next
generation system needed to support the mission of the Data
Center. This includes both hardware and software.
Consideration will be given to:

• CRT devices 

• Large mass storage devices (photographic, laser, 
electron) 

• Archival media (film images, inexpensive magnetic 
devices, etc.) 

• Computer-compatible microfilm equipment 

• Computer-oriented publications 

• Syntactic analyzers (to interpret natural written 
English inputs) 

• Procedures for analysis of space science data 

• Photo recognition techniques 

V. The Interfaces Between the Operational and 
Secondary Use of Data 

Up to now, we have described the secondary uses of 
space science data, the flow of data and information, and 
the information and data processing system at the National 
Space Science Data Center. No doubt you have noted many 
similarities between operational data processing centers 
and our activities. At this time, the need for both oper- 
ational and secondary use of the same data in terms of 
the big picture should be reemphasized. 

First let us recap some of the distinctions and inter- 
faces between these uses. A project is usually set up to 
generate data for a specific need. This could be an
economic requirement in the area of weather, agriculture,
R&D, etc. The key item is that such data are used as
generated almost on a real-time basis to satisfy the identi- 
fied requirements. This is fine, but the costs and time in- 
volved in obtaining, processing, and analyzing these data 
dictate that they should possibly be preserved for future 
use. And as we have tried to show, there are many valid 
secondary uses for such data. 

In many instances, the major secondary users do not 
require the data per se, but, instead, need products
derived from extracting, compiling, evaluating, reformatting,
and synthesizing the data. Such products may be charts, 
atlases, models, statistical studies of properties and phe- 
nomena, handbooks, etc. And, as mentioned, the users of 
such products may not be the scientists intimately in- 
volved in the particular discipline. More commonly, they 
include such groups as scientists in related disciplines, 
engineers and designers, planners, management, educa- 
tional activities, recreational activities, commercial ac- 
tivities, and the general public. 

It should be clear by now that there is a valid need to 
consider both uses of data. Figure 6 details the general- 
ized flow of data. A few observations may be helpful at 
this time in following this flow. First, once a data gen- 
erating program is approved, acquisition scientists 
backed by data processing specialists from the data cen- 
ter that will be acquiring the data for secondary use should 
start working with the generators during the time that data 
reduction plans are being formulated. There are a num- 
ber of reasons for this involvement. 

• The primary data generators and users may not be
aware of all secondary data uses.

• Data processing compatibility should be considered.

• The proper data from each experiment/operation
should be acquired.

• Interface schedules should be established.

Figure 6. Generalized Data Flow

Following the data flow in Figure 6, when the data ar- 
rive at the reduction or operational analysis point, they 
begin the cycle of serving related, but different purposes. 
When arrangements are made for sending the data to a 
data center, supporting documentation must also be
collected to make these data useful for secondary users. It
should also be noted that for secondary uses, there is no
requirement for real-time links; delays are acceptable
and normal. However, it is imperative that each center
have some options in deferring or accepting past and
currently available data. Each center should have the
prerogative of determining what data are important in terms
of known or potential secondary user requirements and on
what it should expend its limited resources.

On the subject of resources, we do not believe that it
is feasible for data centers to be totally self-sufficient,
just as research and development efforts are not. The
agency responsible for the data-gathering program
should provide funds for the experiment or operational
programs to make the data and necessary
documentation available to the center. Conversely, the agency
responsible for the operation of the data center should
fund the internal operation and its portion of the
acquisition costs. Thus, it would appear to be
appropriate for a fraction of the agencies' R&D and/or operational
budget which supports the data-gathering programs to be
used for supporting data center activities. In the case of
space science data, about 1% of the funds expended to
generate the data for primary use would be sufficient to
perform the functions and services that we have been
discussing.

VI. The Next Phase

We have tried to show you the many different types of 
secondary data users, to tie together some of our com- 
mon data processing and information problems, and to 
give you some insight into the system that we have de- 
veloped and are continuing to develop. More important, 
some of our present data management problems based on 
our experiences with secondary data users point quite 
vividly to the direction we will be going in the future. In 
hopes of furthering the solution of some of our common
problems, I would like to state some of our views based 
on these experiences with secondary data users. 

As the Data Center grows, so must its information 
system. It must be able to handle large varieties and 
amounts of data and prepare them for a multitude of dif- 
ferent uses. It must rely on new and better software and 
hardware to effectively perform these tasks, bearing in 
mind the guiding principle of furthering the effective use 
of data from space science experiments. The present 
system software must be upgraded with respect to proc- 
essing incoming tapes for verification of inputs and quality 
control — two goals are immediate detection of errors or 
omissions and standard maintenance and system quality 
control programs. Effective purging of the active data 
base will have to be accomplished. Consideration must
be given to a good long-term archival medium as the
lifetime of magnetic tape cannot compare with photo- 
graphic or printed matter, although recent tests are more 
encouraging. Time-phased data compression will be 
another vital area of concern. Considerations include 
higher density storage techniques and the actual compres- 
sion of data. This data compression can occur in various 
steps. The first step would involve the retirement of al- 
ternate forms of the data in which the most useful form 
would be retained. Then, even this most useful form of 
data could be subjected to the removal of derived vari- 
ables, which are computed from basic positional and atti- 
tude information. This would still permit recalculation 
of these variables at a later date, should this prove nec- 
essary. At this point, no reduction in the basic informa- 
tion content has occurred. However, if one wishes to use 
this data, more time and resources will have to be uti- 
lized than previously. One is balancing this cost against 
the maintenance cost of keeping all the derived bits in the 
active data base. There is a break-even point depending 
on data usage. As one starts the irreversible process of 
destroying information content, a sensible approach would 
be to separate the background information (ambient, quiet 
time) from the event information (disturbed time). This 
will permit time averaging the background information, 
say over hours or days, for subsequent use in determining 
long-term changes. A sizable reduction in the number of 
bits for a given data set will occur in this process, and 
yet the information most likely to be used in future studies 
will still be available. Clearly, data of historical signifi- 
cance would be preserved as long as possible. 
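
The last, irreversible step described above (separating event data from background and time averaging the background) can be sketched in Python as follows. Classifying a sample as an "event" by a simple threshold is our assumption for illustration; the text does not say how disturbed-time data would be identified, and the function name and arguments are likewise hypothetical.

```python
from statistics import mean


def separate_and_average(samples, threshold, window):
    """Split samples into events and time-averaged background.

    samples   -- list of (time, value) pairs; time in any consistent unit
    threshold -- values above this are kept in full as event data (assumed rule)
    window    -- averaging interval for background, in the same unit as time

    Returns (events, background), where background is a list of
    (window_start_time, mean_value) pairs, one per non-empty window.
    """
    events = [(t, v) for t, v in samples if v > threshold]

    # Group quiet-time samples by the averaging window they fall into.
    bins = {}
    for t, v in samples:
        if v <= threshold:
            bins.setdefault(int(t // window), []).append(v)

    background = [(index * window, mean(values))
                  for index, values in sorted(bins.items())]
    return events, background
```

Every event sample survives unchanged, while each window of background collapses to a single averaged value, which is where the sizable reduction in bits comes from.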

In other words, certain points should be considered 
when planning for the retirement of data. 

1. Large volume plus cost of maintenance plus fixed
resources dictate the orderly retirement of data.

2. Early in its life, various forms of the data are
useful, e.g., time-ordered, space-ordered, etc.

3. Data can be reduced without losing information
content:

• By eliminating certain forms of data

• By removing derived variables

• By keeping only the significant number of bits,
not the full computer words

4. Information content of data can be reduced:

• By breaking out event data from background

• By averaging the background over suitable time
intervals

• By preserving only special data for historical
purposes

• By preserving only outstanding geophysical event
data

• By compressing data into analyzed forms so that
general understanding of phenomena is retained

In short, data can shrink in size and in information con- 
tent, but knowledge of it never disappears from the scene. 
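
One of the lossless reductions listed above, keeping only the significant number of bits rather than full computer words, can be illustrated with a short Python sketch. The packing layout (values stored end to end, most significant bit first, zero-padded to a whole byte) and the function names are assumptions for the example, not a description of any NSSDC format.

```python
def pack_values(values, bits):
    """Pack non-negative integers into bytes using `bits` bits per value."""
    acc = 0
    for v in values:
        if not 0 <= v < (1 << bits):
            raise ValueError(f"{v} does not fit in {bits} bits")
        acc = (acc << bits) | v
    total = len(values) * bits
    pad = (-total) % 8            # zero bits added to reach a byte boundary
    return (acc << pad).to_bytes((total + pad) // 8, "big")


def unpack_values(data, bits, count):
    """Recover `count` values of `bits` bits each from packed bytes."""
    pad = len(data) * 8 - count * bits
    acc = int.from_bytes(data, "big") >> pad
    mask = (1 << bits) - 1
    return [(acc >> (bits * (count - 1 - i))) & mask for i in range(count)]
```

Four 10-bit measurements stored one per 36-bit word occupy 144 bits; packed this way they occupy 40 bits, and `unpack_values` returns them unchanged, so no information content is lost.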

Some thought, as we have previously mentioned, is al- 
ready given to the next generation of the NSSDC informa- 
tion system. One consideration is to provide the Data 
Center with much greater flexibility and capability by 
developing varied analysis programs which can be readily 
applied to the data. Although complete requirements have 
not yet been defined, it is envisioned that scientists, ex- 
perimenters, and acquisition agents should be able to 
interact, on-line, through a computer, with data bases and
data sets held at NSSDC. It is also anticipated that in this
way the resulting dialogue between two or more scientists
can be used to synthesize new information in the process.

These concepts are not too far from reality. With the 
progression of time, the central processing facilities are 
performing more work on the raw data before it is sent to 
the experimenters for analysis. In the beginning, the raw 
data was sent directly. Now, tapes are digitized and ed- 
ited, noise flags are inserted, time overlaps are removed, 
and decommutation is performed. There is interest at 
present in having the orbit and attitude information 
merged with the data before it is sent to the experimenter. 
As high-speed data links become available across the 
country, there will be no need to send the data to the ex- 
perimenter. Instead, standard processing will be per- 
formed up to the point where detailed analysis can begin. 
The data in this form could be sent directly into the Data 
Center where it could be reached via high-speed termi- 
nals and manipulated on large computers by the principal 
investigators using many standardized analysis programs. 
Special-purpose analysis programs would be constructed 
on-line by the individual users as the needs arise. At 
that point, the processing facility and the Data Center will 
have blended into one operation. There exists today an
on-line retrieval system with a 10^12 bit capacity.(6) This
is capable of handling a year's worth of space science
data at the present rate of generation. Within 5 years,
10^15 bit systems seem to be feasible. This technological
advancement will permit the development of a truly
interactive system with the whole space science data base plus
correlative ground-based measurements. It is quite clear
that this new type of facility could be a reality in 5-10 
years. 
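
To put the capacities quoted above in perspective, the following short Python calculation converts the stated generation rate of 10^12 bits per year into an average bit rate and shows how many years of data at that rate a 10^15 bit system would hold. The constant and function names are ours, and the calculation assumes, purely for illustration, that the generation rate stays fixed.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # mean year length in seconds


def average_bit_rate(bits_per_year):
    """Average rate in bits per second for a given yearly volume."""
    return bits_per_year / SECONDS_PER_YEAR


def years_of_capacity(capacity_bits, bits_per_year):
    """Years of data a store can hold at a constant generation rate."""
    return capacity_bits / bits_per_year


rate = average_bit_rate(10**12)              # a little under 32,000 bits/s
years = years_of_capacity(10**15, 10**12)    # 1000 years at the present rate
```

In other words, a 10^12 bit store holds one year of data at the present rate, and a 10^15 bit store would hold a thousand times that, leaving ample room for growth in the generation rate and for correlative ground-based measurements.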

In closing, we would like to say that there does not 
appear to be any requirement for a monolithic data center 
or for high-speed data links among all data centers.
There is, however, a genuine need for close coordination
and cooperation among such data centers both now and in 
the future. This would facilitate the identification and 
solution of problem areas, the reduction of unnecessary 
overlap, and the development and spread of technological 
advances in data storage, manipulation, and retrieval. 